2,267 research outputs found

    Visually Mining Interesting Patterns in Multivariate Datasets

    Mining patterns and discovering knowledge in multivariate datasets are important tasks that help analysts understand and describe a dataset and predict unknown data values. However, conventional computer-supported data mining approaches often limit the user's ability to get involved in the mining process and to interact during pattern discovery. Moreover, without a visual representation of the extracted knowledge, analysts can have difficulty understanding and explaining the patterns. Therefore, instead of directly applying automatic data mining techniques, it is necessary to develop techniques and visualization systems that allow users to interactively perform knowledge discovery, visually examine the patterns, adjust the parameters, and discover more interesting patterns based on their requirements. In this dissertation, I discuss several proposed visualization systems that assist analysts in mining patterns and discovering knowledge in multivariate datasets, including their design, implementation, and evaluation. Three types of patterns are discussed: trends, clusters of subgroups, and local patterns. For trend discovery, the parameter space is visualized so that the user can visually examine it and find where good linear patterns exist. For cluster discovery, the user can interactively set a query range on a target attribute and retrieve all the sub-regions that satisfy the user's requirements. The sub-regions that satisfy the same query and are near each other are grouped and aggregated to form clusters. For local pattern discovery, the patterns for a local sub-region consisting of a focal point and its neighbors are computationally extracted and visually represented. To discover interesting local neighbors, the extracted local patterns are integrated and visually shown to the analysts. Evaluations of the three visualization systems using formal user studies are also performed and discussed.

    Model and Integrate Medical Resource Available Times and Relationships in Verifiably Correct Executable Medical Best Practice Guideline Models (Extended Version)

    Improving patient care safety is an ultimate objective for medical cyber-physical systems. A recent study shows that patient death rates are significantly reduced by computerizing medical best practice guidelines. Recent data also show that some morbidity and mortality in emergency care are directly caused by delayed or interrupted treatment due to a lack of medical resources. However, medical guidelines usually do not provide guidance on medical resource demands or on how to manage potential unexpected delays in resource availability. If medical resources are temporarily unavailable, safety properties in existing executable medical guideline models may fail, which may increase the risk to patients under care. The paper presents a separately model and jointly verify (SMJV) architecture that models medical resource available times and relationships separately and jointly verifies the safety properties of existing medical best practice guideline models with the resource models integrated. The SMJV architecture allows medical staff to effectively manage medical resource demands and unexpected resource availability delays during emergency care. The separated modeling approach also allows different domain professionals to make independent model modifications, facilitates the management of frequent resource availability changes, and enables resource statechart reuse in multiple medical guideline models. A simplified stroke scenario is used as a case study to investigate the effectiveness and validity of the SMJV architecture. The case study indicates that the SMJV architecture is able to identify unsafe properties caused by unexpected resource delays. Comment: full version, 12 pages

    Generalized Hyper-cylinders: a Mechanism for Modeling and Visualizing N-D Objects

    The display of surfaces and solids has usually been restricted to the domain of scientific visualization; however, little work has been done on the visualization of surfaces and solids of dimensionality higher than three or four. Indeed, most high-dimensional visualization focuses on the display of data points. However, the ability to effectively model and visualize higher dimensional objects such as clusters and patterns would be quite useful in studying their shapes, relationships, and changes over time. In this paper we describe a method for the description, extraction, and visualization of N-dimensional surfaces and solids. The approach is to extend generalized cylinders, an object representation used in geometric modeling and computer vision, to arbitrary dimensionality, resulting in what we term Generalized Hyper-cylinders (GHCs). A basic GHC consists of two N-dimensional hyper-spheres connected by a hyper-cylinder whose shape at any point along the cylinder is determined by interpolating between the endpoint shapes. More complex GHCs involve alternate cross-section shapes and curved spines connecting the ends. Several algorithms for constructing or extracting GHCs from multivariate data sets are proposed. Once extracted, the GHCs can be visualized using a variety of projection techniques and methods to convey cross-section shapes.
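
    The basic GHC described in the abstract above can be illustrated with a small membership test. This is only a sketch under stated assumptions, not the paper's algorithm: the spine is taken to be a straight segment, the cross-section radius is interpolated linearly between the two end radii, and the function name `inside_basic_ghc` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def inside_basic_ghc(
    point: Sequence[float],
    c0: Sequence[float],
    r0: float,
    c1: Sequence[float],
    r1: float,
) -> bool:
    """Return True if `point` lies inside a basic GHC in N dimensions.

    Assumptions (not taken from the paper): straight spine from c0 to c1,
    spherical cross-sections, linear interpolation of the radius.
    """
    # The two end hyper-spheres are part of the shape.
    if math.dist(point, c0) <= r0 or math.dist(point, c1) <= r1:
        return True

    axis = [b - a for a, b in zip(c0, c1)]
    axis_len_sq = sum(d * d for d in axis)
    if axis_len_sq == 0.0:
        # Degenerate spine: only the end spheres exist.
        return False

    # Parameter t of the orthogonal projection of the point onto the spine.
    rel = [p - a for p, a in zip(point, c0)]
    t = sum(r * d for r, d in zip(rel, axis)) / axis_len_sq
    if not 0.0 <= t <= 1.0:
        return False

    # Compare the distance from the spine with the interpolated radius.
    spine_point = [a + t * d for a, d in zip(c0, axis)]
    radius = (1.0 - t) * r0 + t * r1
    return math.dist(point, spine_point) <= radius
```

    For example, with end spheres of radius 1 and 2 centered at the 4-D points (0, 0, 0, 0) and (4, 0, 0, 0), the interpolated radius halfway along the spine is 1.5, so a point at distance 1.4 from the spine midpoint is inside and one at distance 1.6 is outside.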

    Distributionally Robust Machine Learning with Multi-source Data

    Classical machine learning methods may lead to poor prediction performance when the target distribution differs from the source populations. This paper utilizes data from multiple sources and introduces a group distributionally robust prediction model defined to optimize an adversarial reward about explained variance with respect to a class of target distributions. Compared to classical empirical risk minimization, the proposed robust prediction model improves the prediction accuracy for target populations with distribution shifts. We show that our group distributionally robust prediction model is a weighted average of the source populations' conditional outcome models. We leverage this key identification result to robustify arbitrary machine learning algorithms, including, for example, random forests and neural networks. We devise a novel bias-corrected estimator to estimate the optimal aggregation weight for general machine-learning algorithms and demonstrate its improvement in the convergence rate. Our proposal can be seen as a distributionally robust federated learning approach that is computationally efficient and easy to implement using arbitrary machine learning base algorithms, satisfies some privacy constraints, and has a nice interpretation of different sources' importance for predicting a given target covariate distribution. We demonstrate the performance of our proposed group distributionally robust method on simulated and real data with random forests and neural networks as base-learning algorithms.
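
    The identification result stated in the abstract above, that the robust model is a weighted average of the source populations' conditional outcome models, can be sketched as follows. This shows only the aggregation step; how the paper estimates the optimal weights is not given in the abstract, so the weights here are supplied by the caller. The function name `aggregate_predictions` and the requirement that the weights be nonnegative and sum to one are assumptions made for illustration.

```python
from typing import Callable, Sequence


def aggregate_predictions(
    models: Sequence[Callable[[float], float]],
    weights: Sequence[float],
    x: float,
) -> float:
    """Predict at x as a weighted average of per-source fitted models.

    `models` holds one fitted conditional outcome model per source, and
    `weights` holds the aggregation weights (assumed here to be
    nonnegative and to sum to one). Estimating the weights is out of scope.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per source model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(w * model(x) for w, model in zip(weights, models))
```

    Any fitted predictor with a callable interface, such as a random forest or a neural network wrapped in a function, could stand in for the per-source models.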